2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth Estimation
Stereo correspondence matching is an essential part of the multi-step stereo
depth estimation process. This paper revisits the depth estimation problem,
avoiding the explicit stereo matching step using a simple two-tower
convolutional neural network. The proposed algorithm is called 2T-UNet.
The idea behind 2T-UNet is to replace cost volume construction with twin
convolution towers. The two towers are allowed to have different weights.
Additionally, the inputs to the twin encoders in 2T-UNet differ from those of
existing stereo methods. Generally, a stereo network
takes a right and left image pair as input to determine the scene geometry.
However, in the 2T-UNet model, the right stereo image is taken as one input and
the left stereo image, along with its monocular depth clue information, is taken
as the other input. Depth clues provide complementary suggestions that help
enhance the quality of predicted scene geometry. The 2T-UNet surpasses
state-of-the-art monocular and stereo depth estimation methods on the
challenging Scene Flow dataset, both quantitatively and qualitatively. The
architecture also performs well on complex natural scenes, highlighting
its usefulness for various real-time applications. Pretrained weights and code
will be made readily available.
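
The asymmetric two-tower data flow described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes that the depth clue is a single-channel map concatenated to the left image along the channel axis, that the two feature maps are fused by concatenation, and it uses per-pixel linear maps (1x1 convolutions) as stand-ins for the UNet encoders and decoder. All function names, the `params` layout, and the sizes are hypothetical.

```python
import random

def make_image(channels, height, width, rng):
    """Return a channels x height x width nested list of random floats."""
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]

def random_weights(out_channels, in_channels, rng):
    """Return an out_channels x in_channels weight matrix."""
    return [[rng.uniform(-1, 1) for _ in range(in_channels)]
            for _ in range(out_channels)]

def concat_channels(a, b):
    """Stack two C x H x W tensors along the channel axis."""
    return a + b

def pointwise_tower(x, weights):
    """Toy stand-in for a convolutional tower: 1x1 convolution followed by ReLU."""
    height, width = len(x[0]), len(x[0][0])
    return [[[max(0.0, sum(w * x[c][i][j] for c, w in enumerate(w_row)))
              for j in range(width)] for i in range(height)]
            for w_row in weights]

def two_tower_forward(left_rgb, left_clue, right_rgb, params):
    # Tower A: left image plus its monocular depth clue (assumed channel concat).
    feat_a = pointwise_tower(concat_channels(left_rgb, left_clue), params["tower_a"])
    # Tower B: right image only, with its own (unshared) weights.
    feat_b = pointwise_tower(right_rgb, params["tower_b"])
    # Assumed fusion by concatenation, then a head that emits one depth channel.
    fused = concat_channels(feat_a, feat_b)
    return pointwise_tower(fused, params["head"])[0]

rng = random.Random(0)
H, W, F = 4, 6, 8  # hypothetical image size and feature width
params = {
    "tower_a": random_weights(F, 4, rng),   # 3 RGB channels + 1 depth-clue channel
    "tower_b": random_weights(F, 3, rng),   # 3 RGB channels
    "head": random_weights(1, 2 * F, rng),  # fused features -> 1 depth channel
}
left_rgb = make_image(3, H, W, rng)
left_clue = make_image(1, H, W, rng)
right_rgb = make_image(3, H, W, rng)
depth = two_tower_forward(left_rgb, left_clue, right_rgb, params)
print(len(depth), len(depth[0]))
```

The point of the sketch is that no cost volume is built: each view passes through its own tower, the towers take inputs with different channel counts, and depth is predicted directly from the fused features.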